ABSTRACT


Constrained  minimum  discrimination  information  methods  provide  a 
basis  for  a  unified  approach  to  a  wide  range  of  problems  in  marketing 
research.  For  instance,  they  lead  to  characterizations  parallel  to  those 
of  the  Hendry  system  and  other  entropic  approaches,  with  greater  economy  of 
assumptions. Goodness-of-fit tests and a structure for decision modelling
are supplied from the same basic models with a range of applications that
include market segmentation and brand shifting choices. Other probabilistic
models of marketing choice (logit, MCI, etc.) are also comprehended in
ways  that  resolve  many  logical  and  computational  difficulties  in  these 
other  approaches. 
Unifying Market Research

Market  research  has  become  increasingly  complex.  This  characterization 
applies  to  methods  of  analysis  as  well  as  their  areas  of  application.  An 
effort  at  unifying  these  proliferating  tools,  techniques  and  concepts  would 
thus  seem  to  be  in  order. 

It is the purpose of this paper to provide a basis for such unification
in ways which also serve to increase the power of these disparate
developments. Thus, the proposed methodological (and conceptual) unification
is to be attained in a way that will allow these various developments to
continue, when they are applicable, but also to provide an underlying
conceptual-methodological framework with which to relate them to each other.

Our approach to the proposed unification will be via "information
theory". In particular, we shall use what is called the "information statistic"
and show how different models and methods which are commonly used in various
parts of marketing can be related to this one statistic. We shall also
show how both decision theoretic and classical statistical
methods can thereby be related. The latter will include classical
techniques of regression and correlation, as used in marketing, as well as
more recent variants such as "logit" and "probit" analysis, etc. It
follows that a single consistent basis will thereby also be supplied for
unifying the research approaches to different marketing areas such as
brand switching, market segmentation, store location and market areas, etc.
Conversely, the availability of an efficient unified approach, such as we
will be suggesting, will also supply a basis for reviewing past results in
the light of new alternatives that will thereby be brought into view.
Unifying Statistics

As far back as 1925, R.A. Fisher*,** advanced the basic notion that
the  discipline  of  statistics  could  be  regarded  as  being  concerned  with 
information  and  its  measurement.  For  the  normal  distribution  with  which 
he  was  then  concerned,  Fisher  showed  that  the  reciprocal  of  the  variance 
provides  a  measure  of  the  average  amount  of  information  supplied  by  each 
unit  in  a  sample  for  estimating  the  corresponding  population  mean. 

Thus,  the  observations  in  a  sample  with  a  large  variance  communicate 
less  information  (per  observation)  than  would  be  the  case  for  a  sample 
with  a  smaller  variance. 

Drawing upon concepts from classical physics, Claude Shannon and
Norbert Wiener*** (circa 1949) developed a measure of information for
messages communicated in the form of binary digits (bits) as in, for
instance, a modern digital computer. Theirs was a probability based
approach, however, and therefore seemed to differ from Fisher's statistics
based approach. Kullback and Leibler, in 1951, took a different
tack. They developed the statistical properties of this information
measure and showed the applicability of what is now called the
"Kullback-Leibler statistic" to a wide variety of statistical problems and
methods, including the developments of R.A. Fisher.****

This work by Kullback (1959) and by Kullback and Leibler (1951)
also supplied a basis for still further progress. Akaike (1973),*****
for instance, was able to show that the Kullback-Leibler statistic could be
used to unify supposedly separate parts of statistics such as decision
theory and classical (maximum likelihood) approaches. He was also able to
resolve a variety of paradoxes and to deal with open questions such as the
number of terms to include in a regression or a factor analysis in a precise
statistical manner.

*See Fisher (1935).

**We are here giving only a rough characterization of Fisher's
expression of his thoughts. For full detail, see Fisher (1935).

***See Shannon and Weaver (1949). See also Khinchin (1957).

****These topics are discussed in Kullback (1959).

*****See Akaike (1973), (1977) and (1978). See also Sawa (1978).

This  all  suggests  that  the  end  of  these  developments  is  still  not 
in  sight.  It  also  suggests  that  our  proposed  basis  for  unification  will 
better  position  different  parts  of  market  research  to  take  advantage  of 
these developments as they occur. In any case the task of the immediately
following sections will be to exhibit how the proposed unification might
now be achieved. In addition we will exploit the recent results of
A. Charnes, W.W. Cooper, et al.* which bring together information theory
and mathematical programming to deal with policy evaluation and statistical
inference in a single model. This makes it possible to study marketing plans
and management policies and to evaluate their consequences in ways that
would not previously have been possible.

Although  the  sections  that  follow  will  require  mathematical  notation, 
we  will  supply  references  rather  than  formal  proofs.  After  this  has  been 
done  we  will  return  to  purely  verbal  characterizations  and  interpretations 
in  the  concluding  section  of  this  paper.  This  will  allow  us  to  summarize 
what  we  will  have  covered  and  to  indicate  possible  courses  of  further 
development.  Here,  however,  we  need  to  emphasize  that  only  a  beginning  has 
thus  far  been  made  in  the  proposed  unification  and  much  more  remains  to 
be  done. 

*See Charnes and Cooper (1974); Charnes, Cooper, and Learner (1978);
Charnes,  Cooper,  and  Seiford  (1978);  Brockett,  Charnes,  and  Cooper  (1978); 
Phillips  (1980). 


The  MDI  Method 


The Kullback-Leibler statistic may be written

$$I(p:q) = \sum_{i=1}^{n} p_i \ln \frac{p_i}{q_i}.$$

Its constrained minimum, called the Minimum Discrimination Information (MDI)
statistic, gives the information in favor of the distribution $p_i > 0$,
$\sum_{i=1}^{n} p_i = 1$, as against the distribution $q_i > 0$,
$\sum_{i=1}^{n} q_i = 1$.

We  elaborate  this  further  as  follows. 

Minimizing the information $I(p:q)$ for discrimination between the
probability distributions p and q, subject to any constraints that may apply
to the parameters of p, results in an estimate of p which is the distribution
least distinguishable from q, but which satisfies the constraints
(which q itself may not do). In many important cases, these information
theoretic estimates are maximum likelihood estimates, and they are in general
best asymptotically normal.*

An asymptotic distribution theory of $I^*(p:q)$ (the minimum discrimination
information (MDI) value) leads to a test of the hypothesis that p
and q are identical, i.e. that the observed parameters are consistent with
the estimated parameters. Estimation and hypothesis testing are thus
achieved simultaneously. Since $I(p:q)$ is a general measure of the "distance"
between p and q, all estimates and inferred relationships resulting from a
constrained MDI problem are valid whether or not $H_0$ is accepted.

*See Gokhale and Kullback (1978a) for a full discussion.

Noting the above, Charnes, Cooper and Learner (1978) brought an extended
version of MDI to the problem of brand shifting as incorporated in MRCA's
SANDDABS model. This extended version comprehends an approach to MDI under
inequality constraints with a new duality relation involving an especially
simple unconstrained convex functional which can be used to provide additional
insights (and power) to MDI approaches and also to simplify the computations.*

This  also  makes  it  possible  to  regard  the  following  commonly  used 
marketing models as special cases of the general MDI model: the loglinear
model,  the  maximum  entropy  model,  logit  and  probit  models,  as  well  as 
gravity  models  and  "multiplicative  competitive  interaction"  models  of 
individual  choice,  and  others.  We  will  detail  some  of  these  relationships 
in  summary  fashion  as  follows:  First,  we  will  relate  the  MDI  to  Bayes' 
theorem  from  which  its  relation  to  decision  theory  will  be  evident.  From 
the  canonical  MDI  model  we  will  then  derive  each  of  the  implied  marketing 
models  and  indicate  how  the  MDI  hypothesis  testing  capability  makes  these 
models  more  useful  for  management  decision  making.  For  the  sake  of 
brevity,  the  demonstrations  are  confined  to  the  simplest  case  of  each 
subsidiary  model  (for  instance  the  univariate  dichotomous  logit  model), 
but  extensions  are  indicated,  as  well  as  new  variants  of  the  basic  models 
suggested  by  the  MDI  framework. 


*Further progress includes the derivation of characterizations of the
complete duality states (Charnes, Cooper and Seiford (1978)) and relations to
other types of statistical analyses as in Brockett, Charnes and Cooper (1978).


Following Kullback (1959, p.4), we first relate the MDI statistic to
Bayes' theorem (and hence to decision theory) by writing

$$p(x) = P(x \mid H_1), \qquad q(x) = Q(x \mid H_2),$$

where p(x) and q(x) are statistical distributions with components
$p_i \ge 0$, $q_i \ge 0$. Here $H_1$ and $H_2$ represent hypotheses which
associate the sample values X = x with P and Q respectively. By Bayes' theorem,

$$\frac{P(H_1 \mid x)}{Q(H_2 \mid x)} = \frac{P(x \mid H_1)\,P(H_1)}{Q(x \mid H_2)\,Q(H_2)} = \frac{p(x)\,P(H_1)}{q(x)\,Q(H_2)}$$

or

$$\ln \frac{p(x)}{q(x)} = \ln \frac{P(H_1 \mid x)}{Q(H_2 \mid x)} - \ln \frac{P(H_1)}{Q(H_2)}.$$

The expression on the left is the "log-odds" ratio for the
distribution p against the distribution q on the basis of X = x.
It is evidently the difference between the posterior and prior log-odds,
i.e. the gain in log-odds in favor of $H_1$ supplied by the observation X = x.


Thus the statistic

$$I(p:q) = \sum_i p_i \ln \frac{p_i}{q_i}$$

represents the expected value of this gain. Furthermore we can also
introduce the statistic which Kullback (1959) refers to as the "divergence
measure", via

$$J(p,q) = I(p:q) + I(q:p) = \sum_i p_i \ln \frac{p_i}{q_i} + \sum_i q_i \ln \frac{q_i}{p_i} = \sum_i (p_i - q_i) \ln \frac{p_i}{q_i},$$

which represents a generalization of the usual "generalized distance"
statistic of Mahalanobis. In these terms $I(p:q)$ and $I(q:p)$ represent
what might be called "directed divergences" and J(p,q) a measure of the
divergence between $H_1$ and $H_2$ on the basis of X = x. The latter, i.e.
J(p,q), has all the properties of a distance measure, except that it
need not satisfy the triangle inequality.* The quantity $I(p:q)$ may be
referred to as the Kullback-Leibler statistic or the DI (discrimination
information) statistic in that we have not yet introduced the minimization
principle for selecting p and q.

*See Appendix A in Charnes and Cooper (1961).
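
The directed divergences and the symmetric divergence measure defined above can be checked numerically. The following is a minimal sketch only; the two distributions are hypothetical values chosen for illustration and the function names are not taken from the cited works.

```python
import math

def directed_divergence(p, q):
    """I(p:q) = sum_i p_i ln(p_i / q_i); terms with p_i == 0 contribute nothing."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def divergence(p, q):
    """J(p,q) = I(p:q) + I(q:p)."""
    return directed_divergence(p, q) + directed_divergence(q, p)

# Hypothetical distributions under H1 and H2 (illustrative values only).
p = [0.5, 0.3, 0.2]
q = [0.2, 0.3, 0.5]

i_pq = directed_divergence(p, q)
i_qp = directed_divergence(q, p)
j_pq = divergence(p, q)

# Closed form of J from the text: sum_i (p_i - q_i) ln(p_i / q_i).
j_direct = sum((a - b) * math.log(a / b) for a, b in zip(p, q))

print(i_pq, i_qp, j_pq, j_direct)
```

J is symmetric in its arguments while each directed divergence generally is not, which is why only J is described as distance-like.
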
The Canonical MDI Problem

We refer to problems of the following form as "canonical":

$$\text{Minimize } I(p:q) = \sum_{j=1}^{n} p_j \ln(p_j/q_j)$$

$$\text{subject to } \sum_{j=1}^{n} a_{ij}\, p_j = \theta_i, \quad i = 1, \ldots, m,$$

$$\text{all } p_j > 0.$$

--The $q_j$ and $\theta_i$ are constants. The $q_j$ may be hypothesized values and the
$\theta_i$ sample statistics; or vice versa. In either case, the implied null
hypothesis is $H_0: p = q$, i.e. that the observed and expected figures are
not distinguishable.

--Denote $I^*(p:q)$ as the solution for which $p = p^*$, the minimizing choice of
p. Then $2N I^*(p:q)$ (N is the sample size) is asymptotically distributed
as chi-square, with degrees of freedom depending on n and on the number
of linearly independent constraints (see Gokhale and Kullback (1978),
Phillips (1980)).
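
As an illustration of the canonical problem, the sketch below solves the simplest non-trivial instance: a normalization constraint plus one moment constraint $\sum_j a_j p_j = \theta$. It assumes the solution has the exponential form $p_j \propto q_j e^{\lambda a_j}$ and finds the single multiplier by bisection. The data (q, a, theta, N) are hypothetical, and the bisection routine is an illustrative choice, not the computational method of the cited authors.

```python
import math

def tilt(q, a, lam):
    """Return p_j proportional to q_j * exp(lam * a_j), normalized to sum to one."""
    weights = [qj * math.exp(lam * aj) for qj, aj in zip(q, a)]
    total = sum(weights)
    return [w / total for w in weights]

def solve_mdi_mean(q, a, theta, lo=-50.0, hi=50.0, tol=1e-13):
    """Bisection on the multiplier so that sum_j a_j p_j = theta.

    The constrained mean is increasing in lam, so bisection applies.
    """
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        mean = sum(aj * pj for aj, pj in zip(a, tilt(q, a, mid)))
        if mean < theta:
            lo = mid
        else:
            hi = mid
        if hi - lo < tol:
            break
    lam = 0.5 * (lo + hi)
    return tilt(q, a, lam), lam

def discrimination_information(p, q):
    return sum(pj * math.log(pj / qj) for pj, qj in zip(p, q) if pj > 0)

# Hypothetical inputs: uniform q on four points, observed mean 0.9, sample size 40.
q = [0.25, 0.25, 0.25, 0.25]
a = [0.0, 1.0, 2.0, 3.0]
theta = 0.9
N = 40

p_star, lam = solve_mdi_mean(q, a, theta)
i_star = discrimination_information(p_star, q)
statistic = 2 * N * i_star  # to be referred to a chi-square table

print(p_star, lam, statistic)
```

The value of `statistic` would then be compared with a chi-square critical value at the appropriate degrees of freedom, which this sketch does not compute.
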
MDI and the Loglinear (Multiplicative) Model


Using the method of Lagrange (undetermined) multipliers, Gokhale
and Kullback (1978) prove that the solution of the canonical MDI problem
leads to the loglinear representation of the estimates $p_j^*$:

$$\ln p_j^* = \ln q_j + \sum_{i=1}^{m} \lambda_i a_{ij},$$

where the $\lambda_i$ are determined to satisfy the constraints. This is
the loglinear model as described in, e.g. Bishop, Fienberg and Holland (1975).

--The $p_j^*$ in this loglinear model automatically sum to one, since this is a
condition of the MDI problem.

--Gokhale and Kullback stress that this loglinear model of the $p_j^*$ is a
consequence of the MDI formulation, and is not derived from seemingly
arbitrary assumptions of convenience as in, e.g. Jones and Zufryden (1978).

--The term $\ln q_j$ in the log-odds equation is not merely a constant of
fit. Its meaning is implied by the derivation of $I(p:q)$ via Bayes'
theorem in the earlier section.

--An exponential transformation of both sides of the loglinear equation
yields the multiplicative equation variant

$$p_j^* = q_j \exp\Big(\sum_{i=1}^{m} \lambda_i a_{ij}\Big) = q_j \prod_{i=1}^{m} e^{\lambda_i a_{ij}}.$$


--The  loglinear  model  has  application  in  several  aspects  of  marketing 
research  (Green,  Carmone  and  Wachspress  (1977));  in  transportation 
research  (Oum  (1979),  Phillips  (1978),  McFadden  (1973));  in  representing 
production  functions  (Charnes,  Cooper  and  Schinnar  (1976));  and  physical 
systems  (see  Phillips  (1978)).  Later  sections  of  this  paper  link  the 
loglinear model to logit and MCI models, where further marketing
applications are cited.



The Dual Convex Programming Form
of the Constrained MDI


Charnes, Cooper and Seiford (1978) proved the complete mathematical
programming duality theory for constrained MDI estimation, in terms of the
following dual problems:

primal

$$\sup\; v(\delta) \equiv -\sum_i \delta_i \ln \frac{\delta_i}{e\,c_i} \quad \text{subject to} \quad A^T \delta = b, \quad \delta \ge 0$$

dual

$$\inf\; \xi(z) \equiv \sum_i c_i\, e^{{}_{i}A z} - b^T z, \quad z \text{ unconstrained.}$$

--Here ${}_{i}A$ denotes the ith row of A.

--The duality state of interest comprehends the conditions:

(i) a feasible $\delta$ exists with every $\delta_i > 0$;

(ii) $\xi(z)$ has a minimum at $z^*$ and $v(\delta)$ has a unique maximum at $\delta^*$;

(iii) $\xi(z^*) = v(\delta^*)$; and

(iv) $\delta_i^* = c_i\, e^{{}_{i}A z^*}$.

See Brockett, Charnes and Cooper (1978) for a complete statement. Note that
condition (iv) is the multiplicative (loglinear) model of the estimates $\delta^*$.

--The estimates are easily computed by minimizing the unconstrained convex
function $\xi(z)$, then transforming $z^*$ to $\delta^*$ by means of formula (iv).
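
The computational route described in the last note can be sketched for a small case. The code below minimizes the dual functional $\xi(z)$ by a damped Newton iteration in two dual variables, recovers $\delta^*$ from formula (iv), and checks conditions (iii) and the primal constraints numerically. The matrix A, the vectors c and b, and the choice of Newton's method are assumptions made for illustration; they are not taken from the cited papers.

```python
import math

# Hypothetical problem: rows of A are (1, a_i); b fixes the total mass and the mean.
A = [(1.0, 0.0), (1.0, 1.0), (1.0, 2.0), (1.0, 3.0)]
c = [0.25, 0.25, 0.25, 0.25]
b = (1.0, 0.9)

def delta_of(z):
    """Formula (iv): delta_i = c_i * exp(row_i(A) . z)."""
    return [ci * math.exp(r[0] * z[0] + r[1] * z[1]) for ci, r in zip(c, A)]

def xi(z):
    """Dual functional: sum_i c_i exp(row_i(A) . z) - b . z."""
    return sum(delta_of(z)) - (b[0] * z[0] + b[1] * z[1])

def v(delta):
    """Primal functional: -sum_i delta_i ln(delta_i / (e c_i))."""
    return -sum(d * math.log(d / (math.e * ci)) for d, ci in zip(delta, c))

def minimize_dual(z=(0.0, 0.0), iters=100):
    z = list(z)
    for _ in range(iters):
        d = delta_of(z)
        # Gradient of xi is A^T delta - b; Hessian is A^T diag(delta) A.
        g0 = sum(di * r[0] for di, r in zip(d, A)) - b[0]
        g1 = sum(di * r[1] for di, r in zip(d, A)) - b[1]
        if abs(g0) + abs(g1) < 1e-13:
            break
        h00 = sum(di * r[0] * r[0] for di, r in zip(d, A))
        h01 = sum(di * r[0] * r[1] for di, r in zip(d, A))
        h11 = sum(di * r[1] * r[1] for di, r in zip(d, A))
        det = h00 * h11 - h01 * h01
        s0 = (h11 * g0 - h01 * g1) / det
        s1 = (h00 * g1 - h01 * g0) / det
        t = 1.0  # halve the step until xi does not increase
        while xi((z[0] - t * s0, z[1] - t * s1)) > xi(z) and t > 1e-8:
            t *= 0.5
        z = [z[0] - t * s0, z[1] - t * s1]
    return z

z_star = minimize_dual()
delta_star = delta_of(z_star)
print(z_star, delta_star, xi(z_star), v(delta_star))
```

Because $\xi$ is an unconstrained convex function, any standard descent method could replace the Newton step used here.
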
MDI and the Entropy Model

Max-entropy models have become well-known in transportation
research and more recently in marketing research (see Herniter (1973,
1974); and Phillips (1978) for additional references). The discrete
entropy model involves maximizing the "entropy" function of a distribution p:

$$\text{Maximize } H(p) = -\sum_{j=1}^{n} p_j \ln p_j$$

subject to linear constraints. It is easily seen that H(p) finds its
extremum at the same point as does $I(p:q)$ if we let $q_j = 1/n$ for $j = 1, \ldots, n$:

$$I(p:q) = \sum_{j=1}^{n} p_j \ln(p_j n) = \sum_{j=1}^{n} p_j \ln p_j + \ln n = -H(p) + \text{(constant)}.$$

H(p) and I(p:q) are therefore measures of the deviation of p from
a discrete uniform distribution over n points; in this regard, however, the
entropy function is clearly a special case of the discrimination information
statistic.

--The MDI is more general and offers greater flexibility, since
the null-hypothesis function q can represent any probability function
(not just a uniform distribution).

--The  MDI  theory  has  a  complete  and  rigorous  foundation  in  statistics,  so  we 
need  not  be  troubled  by  non-rigorous  analogies  from  thermodynamics--as  is 
so  often  the  case  with  "entropic"  models  (see  Phillips  (1978),  Haynes, 
Phillips  and  Mohrfeld  (1980)). 
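
The identity linking entropy and discrimination information against a uniform q can be verified directly. This is a minimal sketch with a hypothetical distribution p; the function names are illustrative.

```python
import math

def entropy(p):
    """H(p) = -sum_j p_j ln p_j."""
    return -sum(pj * math.log(pj) for pj in p if pj > 0)

def discrimination_information(p, q):
    """I(p:q) = sum_j p_j ln(p_j / q_j)."""
    return sum(pj * math.log(pj / qj) for pj, qj in zip(p, q) if pj > 0)

p = [0.1, 0.2, 0.3, 0.4]          # hypothetical distribution
n = len(p)
uniform = [1.0 / n] * n            # q_j = 1/n

lhs = discrimination_information(p, uniform)
rhs = math.log(n) - entropy(p)     # -H(p) + ln n
print(lhs, rhs)
```

Since the two quantities differ only by the constant ln n, maximizing H(p) and minimizing I(p:q) under the same constraints select the same p.
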
MDI and the Logit Model

The logit model is a special case of the loglinear model (see
Green, Carmone and Wachspress (1977, p. 56)). The logit uses a linear
function of several independent variables to represent the log-odds
of the occurrence of an event E, given a value of x:

$$\ln \frac{P(E)}{1 - P(E)} = \sum_i \beta_i x_i.$$


We  consider  below  the  dichotomous  case  (occurrence/nonoccurrence  of 
E)  with  one  independent  variable.  An  example  from  Berkson  (1972)  and 
Gokhale  and  Kullback  (1978)  uses  the  following  four  samples  under  different 
values  of  x: 

Sample #   Value of x   Sample Size   # of Successes
   1           0            10              1
   2           1            10              6
   3           2            10              3
   4           3            10              8
 Total                      40             18


Transform to a contingency table representation:

 x    i    Success (j=1)   Failure (j=2)    Σ
 0    1          1               9          10
 1    2          6               4          10
 2    3          3               7          10
 3    4          8               2          10
      Σ         18              22          40


Solve the max-entropy problem:

$$\text{Max } H(p_{ij}) = -\sum_i \sum_j p_{ij} \ln p_{ij}$$

subject to:

$$\sum_{j=1}^{2} p_{ij} = 10, \quad i = 1,2,3,4$$

$$\sum_i p_{i1} = 18$$

$$\sum_i x_i\, p_{i1} = 36$$


The loglinear model results:

$$\ln p_{11}^* = x_1 z_6^* + z_5^* + z_1^*; \qquad \ln(1 - p_{11}^*) = \ln p_{12}^* = z_1^*.$$

This implies

$$\ln \frac{p_{11}^*}{p_{12}^*} = z_6^* x_1 + z_5^*.$$

The loglinear representations of the remaining $p_{i1}$, $p_{i2}$ have the same
respective coefficients, thus

$$\ln \frac{P(E \mid x)}{1 - P(E \mid x)} = z_6^* x + z_5^*$$

as required in the simple logit model, and $P(E \mid x)$ is given by the logistic
cumulant function

$$P(E \mid x) = \left[1 + e^{-(z_6^* x + z_5^*)}\right]^{-1}.$$
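
For the four-sample data above, the connection between the constraints and the logit fit can be illustrated numerically. The sketch below fits a grouped logit by Newton's method on the binomial log-likelihood (an assumed, standard estimation route, not the dual MDI computation itself) and then checks that the fitted success counts reproduce the two marginal constraints, 18 and 36. The intercept and slope play the roles of $z_5^*$ and $z_6^*$ under that assumption.

```python
import math

# Data from the four samples in the text.
x = [0.0, 1.0, 2.0, 3.0]
n = [10, 10, 10, 10]
s = [1, 6, 3, 8]

def logistic(t):
    return 1.0 / (1.0 + math.exp(-t))

def fit_logit(x, n, s, iters=50):
    """Newton iterations for a two-parameter grouped logit (intercept b0, slope b1)."""
    b0, b1 = 0.0, 0.0
    for _ in range(iters):
        P = [logistic(b0 + b1 * xi) for xi in x]
        # Score equations: observed minus fitted successes, plain and x-weighted.
        g0 = sum(si - ni * pi for si, ni, pi in zip(s, n, P))
        g1 = sum(xi * (si - ni * pi) for xi, si, ni, pi in zip(x, s, n, P))
        if abs(g0) + abs(g1) < 1e-12:
            break
        w = [ni * pi * (1.0 - pi) for ni, pi in zip(n, P)]
        h00 = sum(w)
        h01 = sum(wi * xi for wi, xi in zip(w, x))
        h11 = sum(wi * xi * xi for wi, xi in zip(w, x))
        det = h00 * h11 - h01 * h01
        b0 += (h11 * g0 - h01 * g1) / det
        b1 += (h00 * g1 - h01 * g0) / det
    return b0, b1

b0, b1 = fit_logit(x, n, s)
fitted = [ni * logistic(b0 + b1 * xi) for ni, xi in zip(n, x)]
total_successes = sum(fitted)                                # constraint value 18
weighted_successes = sum(xi * f for xi, f in zip(x, fitted))  # constraint value 36
print(b0, b1, fitted)
```

The fitted counts satisfy the same two linear conditions that appear as constraints in the max-entropy problem, which is one way to see why the two approaches agree in this case.
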


We now consider the MDI extension of this max-entropy-to-logit
sequence. If above we replace Max $H(p_{ij})$ by Min $I(p_{ij} : q_{ij})$, the resulting
log-odds are

$$\ln \frac{p_{ij}^*}{1 - p_{ij}^*} = \ln \frac{q_{ij}}{1 - q_{ij}} + z_5^* + z_6^* x_i.$$

Evidently if the prior log-odds $\ln(q_{ij}/(1 - q_{ij}))$ is a linear function of x,
then $\ln(p_{ij}/(1 - p_{ij}))$ will also be a linear function of x. Otherwise, the
resulting representation will constitute a nonlinear generalization of the
logit model.


The trichotomous univariate case (three alternatives, one independent
variable) can be handled similarly. Again using the contingency table
representation

 x    i    j=1   j=2   j=3    Σ
 0    1     1     9     5    15
 1    2     7     2     6    15
 2    3     1     3    11    15
 3    4     8     2     5    15
      Σ    17    16    27

we minimize $I(p_{ij} : q_{ij})$

subject to

$$\sum_{j=1}^{3} p_{ij} = 15, \quad i = 1,2,3,4$$

$$\sum_i p_{i1} = 17, \qquad \sum_i p_{i2} = 16$$

$$\sum_i x_i\, p_{i1} = 33, \qquad \sum_i x_i\, p_{i2} = 14$$

$$p_{ij} > 0.$$

Taking ratios of the loglinear representations as we did in the
dichotomous case, we have that for every value of x,

$$\ln \frac{p_1^*}{p_2^*} = \ln \frac{q_1}{q_2} + (z_7^* - z_8^*)\,x + (z_5^* - z_6^*);$$

$$\ln \frac{p_1^*}{p_3^*} = \ln \frac{q_1}{q_3} + z_7^* x + z_5^*;$$

and

$$\ln \frac{p_2^*}{p_3^*} = \ln \frac{q_2}{q_3} + z_8^* x + z_6^*.$$

The three inverse logits are obtained by reversing signs, since ln x/y is
equal to -ln y/x.


We illustrate one more case below, that of the dichotomous logit
model with two independent variables x and y. For notational clarity we
allow x to take on four values and y three values in this example, which
can therefore be visualized as a three-way table whose entry $p_{ijk}$ is the
number of times alternative j is chosen when $x = x_i$ and $y = y_k$. In the
MDI constraint set below, we have replaced the right-hand-sides with symbols,
since explicit values are not necessary for the logit derivation.

$$\text{Min } I(p_{ijk} : q_{ijk})$$

subject to

$$\sum_{j=1}^{2} p_{ijk} = w_{ik}, \quad i = 1,2,3,4; \; k = 1,2,3$$
(sample size under each of the 12 $(x_i, y_k)$ combinations)

$$\sum_i \sum_k p_{i1k} = r_{13}$$
(total number of observed choices of the first alternative)

$$\sum_i \sum_k x_i\, p_{i1k} = r_{14}$$
(expected value of x given that the first alternative is chosen)

$$\sum_i \sum_k y_k\, p_{i1k} = r_{15}$$
(expected value of y given that the first alternative is chosen)

Then, for any i and k,

$$\ln \frac{p_{i1k}^*}{p_{i2k}^*} = \ln \frac{p_{i1k}^*}{1 - p_{i1k}^*} = \ln \frac{q_{i1k}}{q_{i2k}} + z_{15}^* y_k + z_{14}^* x_i + z_{13}^*,$$

that is, the log-odds differ from the prior log-odds by a linear function of
$x_i$ and $y_k$.

--It is straightforward to combine the latter two cases for a general
polytomous (multinomial) multivariate logit model, e.g. the one given by
Theil (1969):

$$\log \frac{p_i}{p_j} = (\alpha_i - \alpha_j) + \sum_k \gamma_k \left[\log y_{ki} - \log y_{kj}\right] + \sum_h (\theta_{hi} - \theta_{hj}) \log x_h.$$


In  Theil's  model  the  y  variables  are  levels  of  attributes  of  the  choice  objects 
(e.g.  price,  nutrition,  %  distribution,  advertising  exposure,  etc.),  and  the 
x  variables  are  measures  of  consumer  characteristics  (e.g.  income,  number 
of  children,  purchase  histories,  etc.).  This  is  consonant  with  the  MDI 
derivation  of  the  logit  model. 

--McFadden (1973) presents a utility-derived variant of the multinomial
multivariate logit which gives a "conditional logit" expression for
the choice odds given the consumer characteristics. The conditional choice
probabilities in McFadden's model can be separated and written as

$$p_i = P(x_i \mid s) = \frac{\exp\big(\sum_k z_i^k \theta^k\big)}{\sum_j \exp\big(\sum_k z_j^k \theta^k\big)},$$

where s is a vector of consumer characteristics and $\sum_k z_i^k \theta^k$ is the
linear "utility function" of s-type consumers for alternative i. The $\theta^k$
are unknown. We see that

$$p_i = \frac{\prod_k e^{z_i^k \theta^k}}{\sum_j \prod_k e^{z_j^k \theta^k}} = \frac{\prod_k (y_i^k)^{\theta^k}}{\sum_j \prod_k (y_j^k)^{\theta^k}}$$

under the transformation $y_i^k = e^{z_i^k}$. This shows McFadden's conditional logit
to be identical to the "MCI" model dealt with in a later section of this
paper, and estimable in the same manner using the MDI method.

--Theil's empirical use of the logit model and McFadden's utility-theoretic
derivation are both to be contrasted to the MDI approach, in which the
logit  expressions  follow  from  the  solution  of  a  structural,  or  "process" 
model,  which  explicitly  represents  all  of  the  known  information  relevant  to 
the  choice  situation. 

Our logit examples should make it apparent that the form of the
logit or loglinear representation depends on the structure of the MDI
constraints, which depend in turn on the kind of information that is
available for constructing the model. However, it is often the case that a
given set of structural (or policy) conditions can be represented by many
distinct but equivalent sets of linear equations.* Thus within the MDI
format as elsewhere, the available information may suggest, but will not
determine, the form of the estimation model. Recasting the constraints in
the above examples, for instance, may produce some of the logit variants
mentioned by Oum (1979). We will not pursue this possibility here.

*Even by many equivalent sets of linearly independent linear
equations, in many cases, although linear independence of the constraints
is not a prerequisite for an MDI solution under the Charnes-Cooper
theory.

The substitution of equivalent constraint sets will not affect
the value of $I(p:q)$ for a given problem (Gokhale and Kullback (1978)),
but  may  affect  the  significance  values  of  individual  constraints  (see 
Phillips  (1980)  for  a  detailed  discussion). 


--It is possible that this more comprehensive framework for the logit model
will resolve some of the problems that arise in its application (see
e.g. Oum (1979)).

--Marketing applications of logit models are due to Green, Carmone and
Wachspress (1977), Jones and Zufryden (1978, 1979), Flath and Leonard
(1979) and McFadden (1973). The latter uses the "conditional" logit in
a more general context of choice behavior.
MDI and the Probit Model

(Dichotomous  case  with  one  independent  variable) 

This simple probit model involves nothing more than fitting a normal
distribution function (Hanushek and Jackson (1977)). For a given value of x,
the probit model represents the probability of occurrence of an event E as

$$P(E \mid x) = \Phi(\beta x) = P(Y < \beta x),$$

where Y is a standard normal variate.

Then $\frac{d}{dx} P(E \mid x) = \varphi(\beta x)$ (up to the constant factor
$\beta$), where $\varphi$ is the standard normal density function.

We entertain the hypothesis $H_0: P(E \mid x) = \Phi(\beta x)$.

It is sufficient to test

$$H_0: \frac{d}{dx} P(E \mid x) = \varphi(\beta x).$$

Suppose we observe a parameter $\theta = \int T(x)\, dP(E \mid x)$, and solve the MDI:

$$\text{Min } \int_{-\infty}^{\infty} f(x) \ln \frac{f(x)}{\varphi(\beta x)}\, dx \qquad (*)$$

$$\text{s.t. } \int_{-\infty}^{\infty} T(x)\, f(x)\, dx = \theta,$$

where we let $f(x) = \frac{d}{dx} P(E \mid x)$. In Khinchin's (1957) terminology, the
$f^*(x)$ solving this problem is the "conjugate distribution" of $\varphi(\beta x)$.

Kullback's (1959) theorem on the MDI inequality implies that if

$$M(z) = \int e^{zT(x)}\, \varphi(\beta x)\, \lambda(dx)$$

exists on an interval, and if $f^*(x)$ solves problem (*) above, then (and
only then) we have

$$f^*(x) = \frac{e^{zT(x)}\, \varphi(\beta x)}{M(z)}$$

and

$$I[f^*(x) : \varphi(\beta x)] = I^* = \theta z - \ln M(z), \quad \text{where } \theta = \frac{d}{dz} \ln M(z).$$

For example, let $\theta = E(x)$. Then T(x) = x, and for problem (*)
the theorem implies

$$f^*(x) = \frac{e^{xz}\,(2\pi)^{-1/2} e^{-x^2/2}}{(2\pi)^{-1/2} \int e^{-x^2/2}\, e^{xz}\, dx} = \frac{e^{-x^2/2 + xz}}{\int e^{-x^2/2 + xz}\, dx}.$$

Completing the square in the exponents,

$$f^*(x) = \frac{e^{-\frac{1}{2}(x-z)^2}\, e^{\frac{1}{2}z^2}}{e^{\frac{1}{2}z^2} \int e^{-\frac{1}{2}(x-z)^2}\, dx} = \frac{e^{-\frac{1}{2}(x-z)^2}}{\sqrt{2\pi}},$$

which is the normal density function with mean z and unit variance. Thus our
estimate of $P(E \mid x)$ is

$$P^*(E \mid x) = \Phi(\beta x - z);$$

i.e. the conjugate distribution of a normal distribution is another normal
distribution. We may generalize for other $\theta$. The parameter(s) $\theta$ may be
assumed observable in the same manner as in the discussion of the logit model.
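
The worked example with T(x) = x can be checked numerically. The sketch below tilts the standard normal density by $e^{xz}$ on a finite grid, normalizes by a Riemann-sum approximation of M(z), and compares the result with the normal density of mean z and with the relation $I^* = \theta z - \ln M(z)$. The value of z, the grid limits and the step size are hypothetical choices for illustration.

```python
import math

def phi(x):
    """Standard normal density."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)

z = 0.7                                   # hypothetical tilting parameter
h = 0.01                                  # grid step
grid = [-12.0 + h * k for k in range(2401)]

# M(z) = integral of exp(z x) phi(x) dx, approximated by a Riemann sum.
M = sum(math.exp(z * x) * phi(x) for x in grid) * h

# Conjugate (tilted) density f*(x) = exp(z x) phi(x) / M(z).
f_star = [math.exp(z * x) * phi(x) / M for x in grid]

theta = sum(x * f for x, f in zip(grid, f_star)) * h
i_star = sum(f * math.log(f / phi(x)) for x, f in zip(grid, f_star)) * h

# Largest pointwise gap between f* and the N(z, 1) density on the grid.
max_gap = max(abs(f - phi(x - z)) for x, f in zip(grid, f_star))
print(M, theta, i_star, max_gap)
```

In this example M(z) equals $e^{z^2/2}$, so the mean constraint is met with $\theta = z$ and the minimum discrimination information is $z^2/2$.
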
--Applications  of  logit  and  probit  models  have  been  hindered  by  problems 
such  as  the  "independence  of  irrelevant  alternatives"  assumption  (McFadden 
(1973),  Hausman  and  Wise  (1973)).  One  way  of  stating  this  assumption 
(McFadden  (1973))  is  that  following  the  introduction  of  a  new  alternative 
into the choice set, the new share of old alternative i, $p_i^{new}$, should be
equal to $(1-m)\,p_i^{old}$, where m is the stable share of the new alternative and
$p_i^{old}$ was the share of alternative i prior to the introduction of the new
alternative.  If  it  is  possible  to  sample  the  distribution  of  choices 
both  before  and  after  this  introduction,  the  irrelevant  alternatives 
assumption  can  be  tested  via  MDI  or  the  Pearson  chi-square.  The  advantages 
of  MDI  in  this  regard  were  intimated  by  Theil  (1969).  The  work  of 
Charnes,  Cooper,  Learner  and  Phillips  (1980)  is  also  relevant  for 
determining  whether  an  alternative  is  vulnerable  to  share  loss  to  another 
member  of  the  universe  of  choices.  The  question  of  relevant  alternatives 
changes  with  the  product  life  cycle  and  the  point  of  view  of  the  investigator. 
During  the  growth  period  of  carbonated  soft  drinks'  share  of  the  total 
beverage  market,  coffee  was  the  alternative  of  interest  in  studies  conducted 
by  soft  drinks  trade  associations  (Woodruff  and  Phillips  (1974)).  These 
days,  with  a  stable  category  share,  switching  studies  are  sponsored  by 
individual  manufacturers,  and  focus  on  preference  shifts  within  the  soft 
drink  category. 
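
The test of the irrelevant alternatives assumption described above can be sketched as follows. Expected post-introduction shares are formed as $(1-m)\,p_i^{old}$ for the old alternatives and m for the new one, and the statistic $2N \cdot I(p:q)$ is computed alongside the Pearson chi-square. All shares and counts here are hypothetical, and the appropriate degrees of freedom are left to the references cited in the text.

```python
import math

def mdi_statistic(observed_counts, expected_shares):
    """2N * I(p:q) with p the observed shares and q the expected shares."""
    n = sum(observed_counts)
    return 2.0 * sum(c * math.log((c / n) / e)
                     for c, e in zip(observed_counts, expected_shares) if c > 0)

def pearson_statistic(observed_counts, expected_shares):
    n = sum(observed_counts)
    return sum((c - n * e) ** 2 / (n * e)
               for c, e in zip(observed_counts, expected_shares))

# Hypothetical shares before the introduction and stable share m of the entrant.
old_shares = [0.5, 0.3, 0.2]
m = 0.1
expected = [(1.0 - m) * p for p in old_shares] + [m]

# Two hypothetical post-introduction samples of 200 choices each.
proportional_sample = [90, 54, 36, 20]   # matches (1 - m) * p_old exactly
shifted_sample = [70, 60, 40, 30]        # the entrant draws mainly from alternative 1

stat_proportional = mdi_statistic(proportional_sample, expected)
stat_shifted = mdi_statistic(shifted_sample, expected)
pearson_shifted = pearson_statistic(shifted_sample, expected)
print(stat_proportional, stat_shifted, pearson_shifted)
```

A statistic near zero is consistent with proportional draw from the old alternatives; a large value indicates that the new alternative drew disproportionately from some of them.
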

--For  applications  of  probit  models  relevant  for  marketing,  see  Hausman  and 
Wise  (1978)  and  Hanushek  and  Jackson  (1977).  See  also  Daganzo  (1979). 


MDI and MCI


Nakanishi and Cooper (1974) set forth the "Multiplicative Competitive
Interaction" model of individual choice:

$$p_j = \frac{\prod_k y_{kj}^{z_k}}{\sum_l \prod_k y_{kl}^{z_k}}$$


as  a  generalization  of  an  earlier  choice  model  due  to  Huff  (1962).  The 
probability  of  choosing  alternative  j  is  given  as  a  (normalized)  product 
of  terms.  Each  term  reflects  the  amount  of  attribute  k  carried  by  alternative 
j,  raised  to  the  exponent  for  attribute  k. 

We now derive this model via an entropic principle.

$$\text{Max } -\sum_j v_j \ln v_j$$

$$\text{s.t. } \sum_j x_{kj}\, v_j = T_k \quad \forall\, k$$

$$v_j \ge 0 \quad \forall\, j$$


This problem is stated in terms of a frequently purchased consumer
good, e.g. a packaged food. $v_j$ is the number of pounds (or packages) of brand
j purchased in a specified time interval. $x_{kj}$ is now taken to be the amount
of attribute k per package or per pound of brand j. $T_k$ represents the total
amount of attribute k consumed during the interval by the population under
study. (This quantity can be enumerated from consumer panel data for certain
kinds of attributes.)

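At optimum, we have (see the earlier discussion of loglinear models):

$$v_j^* = \exp\Big[\sum_k x_{kj}\, z_k^*\Big] = \prod_k e^{x_{kj} z_k^*} = \prod_k \left(e^{x_{kj}}\right)^{z_k^*} = \prod_k y_{kj}^{z_k^*},$$

where the $z_k^*$ are dual evaluators, and in the last expression we have
substituted $y_{kj}$ for $e^{x_{kj}}$.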


It is then straightforward that the probability Prob{a package
purchased will be a package of brand j} can be written

$$p_j^{*} = \frac{v_j^{*}}{\sum_i v_i^{*}} = \frac{\prod_k y_{kj}^{z_k^{*}}}{\sum_i \prod_k y_{ki}^{z_k^{*}}}$$

which is the MCI model.
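
The derivation can be checked numerically in a small case. The sketch below assumes a single attribute with hypothetical per-package amounts `x` and a hypothetical total `T`; the dual evaluator is found by bisection, which is our choice for illustration and not a method taken from the paper. Differentiating v ln v contributes a constant factor e^{-1} to each v_j; it is kept explicit in the code and cancels when shares are formed.

```python
import math

def entropy(v):
    """Objective of the entropic problem: -sum_j v_j ln v_j."""
    return -sum(vj * math.log(vj) for vj in v)

def solve_dual(x, T, lo=-50.0, hi=50.0, iters=200):
    """Bisection for the dual evaluator z with sum_j x_j exp(x_j z - 1) = T.

    Assumes all x_j > 0, so the left-hand side is increasing in z.
    """
    def excess(z):
        return sum(xj * math.exp(xj * z - 1.0) for xj in x) - T
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if excess(mid) > 0:
            hi = mid
        else:
            lo = mid
    return 0.5 * (lo + hi)

x = [1.0, 2.0, 3.0]   # hypothetical attribute amounts per package
T = 10.0              # hypothetical total attribute consumption
z = solve_dual(x, T)
v = [math.exp(xj * z - 1.0) for xj in x]   # stationary point of the Lagrangian

# Brand shares from the entropic optimum ...
shares = [vj / sum(v) for vj in v]
# ... and from the MCI form with y_j = exp(x_j) and exponent z.
y = [math.exp(xj) for xj in x]
mci = [yj ** z / sum(yi ** z for yi in y) for yj in y]

# Moving along a feasible direction (sum_j x_j d_j = 0) should lower entropy.
d = [1.0, -2.0, 1.0]
eps = 1e-3
perturbed = [entropy([vj + s * eps * dj for vj, dj in zip(v, d)]) for s in (1, -1)]
print(shares, mci, entropy(v), perturbed)
```

The two share vectors agree to rounding error, and both perturbed entropies fall below the entropy at `v`, as the optimality condition requires.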

--Extensions can be made in the MDI constraint set to accommodate attributes
that  cannot  be  expressed  on  a  per-pound  or  per-package  basis. 

--McFadden's  conditional  logit  model,  mentioned  in  an  earlier  section, 
illustrates  the  relationship  between  the  logit  model  and  the  MCI  model. 

MDI and SANDDABS

SANDDABS is a model which has been used for more than fifteen years
at  the  Market  Research  Corporation  of  America  for  estimating  shifts  in  market 
size  and  brand  preferences.  A  SANDDABS  analysis  begins  at  the  level  of  the 
individual  household.  This  stage  of  the  analysis  can  be  represented  by  the 
tableau  below  (which  may  contain  any  number  of  rows  and  columns): 


                             Period II
                      A         B         C       Period I total
Period I     A      δ_AA      δ_AB      δ_AC         p_A^I
             B      δ_BA      δ_BB      δ_BC         p_B^I
             C      δ_CA      δ_CB      δ_CC         p_C^I
Period II total    p_A^II    p_B^II    p_C^II    Σ_i p_i^I = Σ_j p_j^II


The  margins  of  the  tableau  usually  represent  the  volumes  of  brands 
A,  B,  C,...  purchased  by  a  given  household  in  two  periods  of  equal  length 
(although  they  may  for  different  purposes  represent  units  purchased,  purchase/ 
non-purchase  indicators,  or  other  units).  In  practice,  rows  and  columns 
are  added  to  the  tableau  to  reflect  changes  in  the  total  category  volume 
sold.  In  this  way,  SANDDABS  can  allocate  brand  shifting  volumes  to  market 
size  change,  and  vice  versa;  and  the  sum  of  the  tableau  row  sums  equals  the 
sum  of  the  column  sums. 


A  household's  repeat  purchase  of  a  brand  is  reasonably  the  minimum 
$$\sum_{i \neq j} \delta_{ij} = p_j^{*}, \qquad \text{all } \delta_{ij} \geq 0$$

The proof is based on a demonstration that the traditional solution

$$\delta_{ij}^{*} = p_i^{I} \, p_j^{II} \Big/ \sum_j p_j^{II}$$

produces equal values for the primal and dual MDI functionals.
This is a sufficient condition for optimality.
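
A minimal sketch of a household tableau along these lines follows. It rests on our reading of the text and should not be taken as the proprietary SANDDABS procedure: the repeat purchase of each brand is set to the minimum of its two period volumes, and the product formula is then applied to the residual margins. The function name and the volumes are hypothetical.

```python
def household_tableau(p1, p2):
    """Fill a brand-switching tableau from Period I and Period II volumes.

    Assumptions (ours): equal totals in both periods, repeat purchase equal
    to min(p1[i], p2[i]), and residual volumes allocated in proportion to
    the product of the residual row and column margins.
    """
    n = len(p1)
    repeat = [min(a, b) for a, b in zip(p1, p2)]
    out_flow = [a - r for a, r in zip(p1, repeat)]  # residual row margins
    in_flow = [b - r for b, r in zip(p2, repeat)]   # residual column margins
    total = sum(in_flow)
    table = [[0.0] * n for _ in range(n)]
    for i in range(n):
        table[i][i] = repeat[i]
        for j in range(n):
            if i != j and total > 0:
                table[i][j] = out_flow[i] * in_flow[j] / total
    return table

# Hypothetical volumes for four brands, with equal period totals.
p1 = [10.0, 8.0, 4.0, 2.0]
p2 = [6.0, 6.0, 7.0, 5.0]
table = household_tableau(p1, p2)
for row in table:
    print(row)
```

Because a brand cannot have both a positive residual outflow and a positive residual inflow under the minimum rule, the product formula never places volume on the diagonal, and the row and column sums reproduce the two sets of margins.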

Phillips  (1978)  set  down  the  asymptotic  distribution  for  the  Kullback 
information  number  associated  with  the  SANDDABS  summary  matrix  (the  sum  of 
all  household  matrices). 

These  developments  showed  first  of  all  that  the  SANDDABS  procedure 
had  implicit  "underlying  optimizations",  and  secondly  that  SANDDABS  could 
be  used  as  a  flexible  hypothesis  testing  tool,  using  the  asymptotic  theory 
of  the  associated  MDI  number. 

SANDDABS  thus  comprehends  the  ability  to  constructively  test  issues 
of  general  interest  in  marketing: 

--Are  brand  shares  stationary  over  a  given  period  of  time? 

--Have  switching  patterns  changed  over  time? 

--Is  switching  proportional  to  brand  share? 

--Is  a  given  brand  partitioning  scheme  statistically  valid? 

--Is  a  given  consumer  segmentation  scheme  statistically  valid? 

See Charnes, Cooper, Learner and Phillips (1980) and Learner and Phillips (1979)
for  further  discussion  and  examples. 
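
As an illustration of one such test, the sketch below computes an information statistic for the question "Is switching proportional to brand share?". The counts and shares are hypothetical, and the fully specified hypothesis (off-diagonal switching proportional to the product of shares) is our simplification. The scaled value 2N·I is the quantity usually referred to a chi-square distribution; the appropriate degrees of freedom depend on how the hypothesis is specified and are not computed here.

```python
import math

def information(p, q):
    """Kullback information I(p:q) = sum p ln(p/q), taking 0 ln 0 = 0."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def proportional_switching_statistic(counts, shares):
    """2N * I(hypothesized : observed) over the off-diagonal cells.

    Observed off-diagonal counts are assumed to be strictly positive.
    """
    n = len(shares)
    cells = [(i, j) for i in range(n) for j in range(n) if i != j]
    N = sum(counts[i][j] for i, j in cells)
    observed = [counts[i][j] / N for i, j in cells]
    weights = [shares[i] * shares[j] for i, j in cells]
    weight_total = sum(weights)
    hypothesized = [w / weight_total for w in weights]
    return 2.0 * N * information(hypothesized, observed)

shares = [0.5, 0.3, 0.2]                               # hypothetical shares
exact = [[0, 150, 100], [150, 0, 60], [100, 60, 0]]    # proportional by design
skewed = [[0, 200, 50], [150, 0, 60], [100, 60, 0]]    # departs from it
stat_exact = proportional_switching_statistic(exact, shares)
stat_skewed = proportional_switching_statistic(skewed, shares)
print(stat_exact, stat_skewed)
```

The first table is constructed to satisfy the hypothesis exactly, so its statistic is zero up to rounding; the second yields a positive value whose size measures the departure from proportionality.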

MDI and the Hendry Model

The  Hendry  system  is  a  proprietary  and  little-understood  set  of 
models  for  marketing  management,  developed  by  the  Hendry  Corporation.  Few 
technical  writeups  have  been  released  (see  Hendry  (1971)),  but  some  items 
concerning  the  Hendry  market  segmentation  model  seem  fairly  certain: 

(1)  The  model  is  based  on  a  combinatorial  definition  of  "entropy". 

(2)  Hierarchical  brand  attribute  structures  are  posited  and 
taken  to  correspond  to  a  hierarchical  decision  process  on  the 
part  of  the  consumer. 

(3)  The  latter  structure  is  not  statistically  tested;  in  fact, 
all  of  "Hendrodynamics"  has  a  markedly  deductive  flavor,  but 
from  subjective  postulates. 

(4)  Heavy  emphasis  is  placed  on  a  scalar  "switching  constant" 
which  measures  "intensity  of  competition"  between  brands. 

(5)  Brand  switching  volumes  (pairwise)  are  assumed  to  be  proportional 
to  the  product  of  the  brands'  shares,  in  an  "equilibrium" 
situation. 

The  information  theoretic  marketing  models  developed  by  the  current 
authors  have  been  detailed  elsewhere  (Charnes,  Cooper  and  Learner  (1978); 
Phillips  (1978);  and  Charnes,  Cooper,  Learner  and  Phillips  (1980)).  In 
addressing  the  Hendry  models,  we  begin  by  reiterating  the  superior  flexibility 
of  MDI  over  max-entropy  for  representing  estimated  quantities  relative  to 
a hypothesized or baseline state of affairs. Further, the flexible
hypothesis testing capability of the MDI method allows goodness-of-fit tests
of the market segments and structures suggested by Hendry, among others.

Charnes, Cooper, Learner and Phillips (1980) reinterpreted the
Hendry  switching  constant  as  a  vulnerability  ratio,  and  provided  a  method 
for  simultaneous  determination  of  the  ratios  (in  contrast  to  the  trial- 
and-error  method  given  in  Kalwani  and  Morrison's  (1977)  interpretation  of 
Hendry).  Following  this,  the  same  authors  developed  an  information  theoretic 
test  for  the  validity  of  a  product  segmentation  based  on  the  vulnerability 
ratio. 

This was the first statistical test known to the authors of any
of the consequences of the Hendry approach. The test involved characterizing
the apparent segmentation (given by the Charnes-Cooper-Learner-Phillips
algorithm (1980)) by a set of linear constraints on a variable switching
matrix [p_ij]. Given an observed switching matrix [q_ij], minimizing I(p:q)
subject to the constraints constituted a test of whether the segment
structure was consistent with the observed switching. The test will be
detailed in a future report.*


*Charnes,  Cooper,  Learner,  and  Phillips  (1980a). 
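
Since the test itself is deferred to a future report, the following is only a sketch of the general mechanism, not the authors' procedure. It minimizes I(p:q) under a single hypothetical linear constraint (that a fraction `theta` of all switching stays within a segment), using the standard result that the minimizer has the form p proportional to q·exp(tau·a). The segment assignment, the counts, `theta` and the bisection search are all assumptions made for illustration.

```python
import math

def mdi_one_constraint(q, a, theta, lo=-30.0, hi=30.0, iters=200):
    """Minimize I(p:q) subject to sum(p) = 1 and sum(a * p) = theta.

    The solution is an exponential tilt of q; tau is found by bisection,
    using the fact that the tilted mean of a is increasing in tau.
    """
    def tilt(tau):
        w = [qi * math.exp(tau * ai) for qi, ai in zip(q, a)]
        total = sum(w)
        return [wi / total for wi in w]
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if sum(ai * pi for ai, pi in zip(a, tilt(mid))) > theta:
            hi = mid
        else:
            lo = mid
    p = tilt(0.5 * (lo + hi))
    info = sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)
    return p, info

# Hypothetical observed switching counts among four brands (off-diagonal cells).
cells = [(0, 1), (0, 2), (0, 3), (1, 0), (1, 2), (1, 3),
         (2, 0), (2, 1), (2, 3), (3, 0), (3, 1), (3, 2)]
counts = [30, 10, 10, 25, 5, 10, 10, 5, 40, 5, 10, 40]
segment = [0, 0, 1, 1]           # hypothetical segmentation: {0, 1} and {2, 3}
N = sum(counts)
q = [c / N for c in counts]
a = [1.0 if segment[i] == segment[j] else 0.0 for i, j in cells]
theta = 0.9                      # hypothesized within-segment share of switching
p, info = mdi_one_constraint(q, a, theta)
statistic = 2.0 * N * info
print(info, statistic)
```

With several constraints the same idea applies with one multiplier per constraint, though a more capable solver than bisection would then be needed.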

Further Applications of MDI in Marketing

A  recent  article  of  Jones  and  Zufryden  (1978)  combined  a  logit 
model of brand choice with a negative binomial purchase frequency hypothesis
to produce a components-of-sales model for consumer package goods.

The  bulky  parameter  estimation  apparatus  attached  to  this  model  shows 
marked  contrast  to  the  simultaneous  and  easy  solution  of  the  MDI  estimate-- 
for  which  the  logit  representation  is  a  built-in  consequence. 

The  information  theoretic  "MARK-IT"  model  (Phillips  (1978)) 
estimates  the  joint  distribution  of  three  components  of  brand  sales  within 
a  product  category:  brand  loyalty  (i.e.,  probability  of  brand  choice), 
purchase frequency, and transaction size (lbs.). In the original development
of MARK-IT, the greatest emphasis was given to management interpretation
of  the  model;  however,  the  present  work  makes  it  clear  that  a  logit  model 
of  brand  choice  can  follow  directly  from  MARK-IT,  and  that  the  capability 
resides  in  MARK-IT  for  testing  any  distributional  hypothesis  concerning 
transaction  frequency  or  transaction  size. 

Tests  of  other  marketing  questions  (including  other  aspects  of 
the  Hendry  system,  etc.),  seem  readily  possible  with  MDI  procedures.  Charnes, 
Cooper,  Learner  and  Phillips  (1980)  bring  forth  that  the  large-sample 
multinormality of multinomial brand purchase proportions should result in
the  appearance  of  "switching  proportional  to  brand  share"  whether  or  not 
"equilibrium"  is  present.  With  additional  data  such  as  SANDDABS  (described 
in  an  earlier  section)  one  can  test  hypotheses  by  MDI  as  tests  in  contingency 
tables  (Gokhale  and  Kullback  (1978)). 

See  also  Learner  and  Phillips  (1979)  for  an  exposition  of  MDI 
models  which  stresses  managerial  and  decision  theoretic  issues. 




Gokhale and Kullback (1978) explain procedures for testing nested
hypotheses with MDI, and Phillips (1980) provides additional examples.
Nested hypotheses are effected by adding or removing constraints in the
canonical MDI problem; the associated information values and degrees of
freedom are additive and can be displayed in an "Analysis of Information"
table. The sequential procedure is both convenient and meaningful: as
mentioned earlier, I(p:q) is a general distance measure and not merely a
test statistic, so even when a hypothesis is rejected, the MDI procedure
will determine the best alternative hypothesis.
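
The additivity of information values can be illustrated on a small two-way table. In the sketch below the observed table `q` and the target margins are hypothetical; hypothesis H_a fixes the row margins, and the nested hypothesis H_b adds the column margins. Iterative proportional fitting is used to compute the MDI estimate under both sets of constraints, which is a standard device for margin constraints and our choice here.

```python
import math

def info(p, q):
    """I(p:q) = sum_ij p_ij ln(p_ij / q_ij) for two-way tables."""
    return sum(a * math.log(a / b)
               for row_p, row_q in zip(p, q)
               for a, b in zip(row_p, row_q) if a > 0)

def fit_rows(t, rows):
    """Rescale each row of t to the required row margin."""
    return [[x * r / sum(row) for x in row] for row, r in zip(t, rows)]

def fit_cols(t, cols):
    """Rescale each column of t to the required column margin."""
    sums = [sum(col) for col in zip(*t)]
    return [[x * c / s for x, c, s in zip(row, cols, sums)] for row in t]

def fit_both(t, rows, cols, iters=500):
    """Iterative proportional fitting to both sets of margins."""
    for _ in range(iters):
        t = fit_cols(fit_rows(t, rows), cols)
    return t

q = [[0.30, 0.10], [0.20, 0.40]]   # hypothetical observed proportions
rows = [0.5, 0.5]                  # constraints under H_a
cols = [0.6, 0.4]                  # additional constraints under H_b
p_a = fit_rows(q, rows)
p_b = fit_both(q, rows, cols)

table = [("H_a: row constraints", info(p_a, q)),
         ("H_b given H_a: add column constraints", info(p_b, p_a)),
         ("H_b: both sets of constraints", info(p_b, q))]
for label, value in table:
    print(f"{label:40s} {value:.6f}")
```

The first two components sum to the third, which is the decomposition an "Analysis of Information" table displays; multiplying each entry by 2N gives the corresponding test statistics.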

Conclusion

We have now covered a variety of topics in sufficient detail to
suggest how the information statistic, including its extensions to
constrained optimizations, could be used to unify different parts of
market research. This includes topical areas such as brand switching and
individual choice models as well as location and traffic flows. For
methodological unification we have shown how such topics as logit and
probit analysis, with corresponding regressions, can be accorded
information theoretic interpretations and uses. Other points of contact
were also indicated with topics such as market segmentation and preference
analysis. Still others could, of course, be developed in detail, and the
same applies to methodologies other than the ones covered in this paper.

Much  remains  to  be  done,  of  course,  in  identifying  limits  to  the 
unifying  power  of  these  approaches  as  well  as  in  establishing  more 
rigorously  the  contacts  we  have  already  indicated.  En  route  to  the 
indicated  unification,  we  should  also  be  able  to  benefit  from  improved 
abilities  to  deal  with  different  classes  of  marketing  problems. 

Particular  attention  is  called  to  the  joining  of  mathematical 
programming  to  information  theory  which  was  illustrated  in  contexts  such 
as  brand  switching  and  consumer  choice  analysis.  We  did  not,  however, 
examine  the  possible  further  uses  of  these  extensions  to  comprehensive 
market planning and policy and control evaluations. Even within the limits
of the separation of statistical analyses from managerial planning models
that has been customary in marketing (but not in mathematical programming),
we have also indicated some new possibilities. One of these involves the
possible  use  of  constraints  to  deal  with  issues  such  as  the  assumed 
"independence  of  irrelevant  alternatives"  that  has  proved  awkward  to 
treat  in  other  approaches  such  as  multi-dimensional  scaling.  We  also 
indicated  how  the  problem  of  statistically  testing  nested  hypotheses  can  be 
treated  by  constraint  adjunction  and  elimination.  The  hierarchical 
marketing structures of the marketing literature (Kalwani (1979); Urban,
Johnson  and  Brudnick  (1979))  can  also  be  treated  similarly  and  tested  step 
by  step  instead  of  being  only  subjectively  posited  and  re-posited  as  at 
present. 

These constraint possibilities, on the other hand, raise new
problems for statistical research and for mathematical (computational)
research as well.